Fix whisper #1037

csukuangfj · 2024-06-20T13:54:03Z

Fixes #633

Could you use this PR to test the wave file that fails to decode?

Please first use the test.py from this PR. You need to re-export the model using the latest export-onnx.py from this PR.

I will fix the C++ code tomorrow.

CC @GaryLaurenceauAva

szaszakgy · 2024-06-20T16:31:05Z

Hi @csukuangfj, thanks for the feedback!
I am able to run test.py on the original recording. It returns a transcript, but the transcript is truncated: the last 9 words are missing, whereas the shared file problem_01.wav returns a perfect result. I tested 3 more problem recordings. Two of them return a transcript that is truncated compared to the original Whisper output. One still fails with the previous error: 'INVALID_ARGUMENT : Non-zero status code returned while running Expand node. Name:'/Expand' Status Message: invalid expand shape'. This recording contains repetitions (self-corrections or stuttering), but the original Whisper decodes it without issues.
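
A minimal sketch of how such a truncation could be checked automatically, assuming you have the reference transcript (for example, from the original Whisper) and the transcript printed by test.py as plain strings. The helper names `normalize` and `missing_tail` are hypothetical and not part of this PR, and the truncated hypothesis below is made up for illustration.

```python
import string


def normalize(text):
    """Lowercase, strip ASCII punctuation, and split into words."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def missing_tail(reference, hypothesis):
    """Return the reference words missing from the end of the hypothesis.

    Returns None when the hypothesis is not a word-level prefix of the
    reference, i.e. when the difference is not a pure truncation.
    """
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if ref[: len(hyp)] != hyp:
        return None
    return ref[len(hyp):]


# Illustrative strings only; the hypothesis is a hand-made truncation.
reference = (
    "After early nightfall the yellow lamps would light up here and "
    "there the squalid quarter of the brothels."
)
hypothesis = "After early nightfall the yellow lamps would light up"

tail = missing_tail(reference, hypothesis)
print(len(tail), "words missing:", " ".join(tail))
```

A `None` result means the two transcripts diverge before the end, so the problem is something other than a cut-off tail.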

csukuangfj · 2024-06-20T23:37:51Z

Could you share the problematic wav and tell us which model you are using?

szaszakgy · 2024-06-21T09:01:18Z

Hi, I was using the base.en model (I exported it first as you requested and ran it through test.py). I am attaching the audio for which I still get the onnxruntime error. For the others, as I said, the returned transcripts are cut off at the point where the previous version got stuck repeating the same token(s).
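
To make "stuck repeating the same token(s)" concrete, here is a small sketch that flags a token sequence whose tail is one short n-gram repeated several times in a row. This is only an illustration of the symptom, not the logic used in test.py or in this PR; the function name `repeated_tail`, its thresholds, and the token IDs are all assumptions.

```python
def repeated_tail(tokens, max_ngram=4, min_repeats=3):
    """Return n if the sequence ends with an n-gram repeated at least
    `min_repeats` times back to back (smallest such n), otherwise 0."""
    for n in range(1, max_ngram + 1):
        need = n * min_repeats
        if len(tokens) < need:
            continue
        tail = tokens[-need:]
        unit = tail[:n]
        # Compare every consecutive n-sized chunk of the tail to the first one.
        if all(tail[i:i + n] == unit for i in range(0, need, n)):
            return n
    return 0


# Made-up token IDs: the last two tokens repeat three times.
looping = [50257, 11, 42, 7, 42, 7, 42, 7]
healthy = [50257, 11, 42, 7, 13]

print(repeated_tail(looping))
print(repeated_tail(healthy))
```

Calling a check like this after each decoding step would show the step at which a loop begins, which can then be compared with the point where the new transcripts are cut off.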


thewh1teagle · 2024-08-09T15:25:57Z

Tested the newly exported Whisper models with DirectML and CPU on Windows.

tiny.int8 CPU (success)

python .\scripts\whisper\export-onnx.py --model tiny
python .\scripts\whisper\test.py --encoder .\tiny-encoder.int8.onnx --decoder .\tiny-decoder.int8.onnx --tokens tiny-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
2024-08-09 18:05:44.3137218 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:05:44.3223491 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:05:44.9778383 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:05:44.9866576 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.

tiny.int8 DML (success)

python .\scripts\whisper\test.py --encoder .\tiny-encoder.int8.onnx --decoder .\tiny-decoder.int8.onnx --tokens tiny-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
2024-08-09 18:24:38.9712836 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:24:38.9799328 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:24:39.4824920 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:24:39.4912264 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.

medium.int8 CPU (success)

python .\scripts\whisper\export-onnx.py --model medium
python .\scripts\whisper\test.py --encoder .\medium-encoder.int8.onnx --decoder .\medium-decoder.int8.onnx --tokens .\medium-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.

medium.int8 DML (failed)

python .\scripts\whisper\test.py --encoder .\medium-encoder.int8.onnx --decoder .\medium-decoder.int8.onnx --tokens .\medium-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
2024-08-09 18:22:35.7186952 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:22:35.7283108 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:22:40.3298896 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:22:40.3379720 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:22:45.4154322 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running MemcpyToHost node. Name:'Memcpy_token_172' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FF9A58D300E: (caller: 00007FF9A601D211) Exception(3) tid(2f14) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

Traceback (most recent call last):
  File "D:\sherpa\sherpa-onnx\scripts\whisper\test.py", line 415, in <module>
    main()
  File "D:\sherpa\sherpa-onnx\scripts\whisper\test.py", line 370, in main
    logits, n_layer_self_k_cache, n_layer_self_v_cache = model.run_decoder(
                                                         ^^^^^^^^^^^^^^^^^^
  File "D:\sherpa\sherpa-onnx\scripts\whisper\test.py", line 154, in run_decoder
    logits, out_n_layer_self_k_cache, out_n_layer_self_v_cache = self.decoder.run(
                                                                 ^^^^^^^^^^^^^^^^^
  File "C:\Users\User\.rye\py\[email protected]\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MemcpyToHost node. Name:'Memcpy_token_172' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FF9A58D300E: (caller: 00007FF9A601D211) Exception(3) tid(2f14) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

csukuangfj added 2 commits June 20, 2024 21:49

Fix whisper

c0df893

fix typos

0070136

csukuangfj mentioned this pull request Jul 22, 2024

feat: add directml support #1153

Merged

thewh1teagle mentioned this pull request Aug 9, 2024

Whisper medium error with DirectML #1240

Open
